Recording meeting audio via multiple individual smartphones

ABSTRACT

A method of providing audio information from a meeting includes receiving a first audio stream from a first input audio device and a second audio stream from a second input audio device during the meeting, identifying a first audio fragment from the first audio stream, and identifying a second audio fragment from the second audio stream. The method also includes compiling the audio fragments from the first and second audio streams into an audio file that includes at least the first audio fragment and the second audio fragment. The method further includes providing the audio file to one or more recipients. The audio file identifies the first audio fragment as corresponding to a first participant of the meeting and the second audio fragment as corresponding to a second participant of the meeting.

RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 16/904,434, filed on Jun. 17, 2020, entitled, “Recording Meeting Audio Via Multiple Individual Smartphones,” which is a division of U.S. application Ser. No. 16/231,056, filed on Dec. 21, 2018 (now U.S. Pat. No. 10,701,482), entitled, “Recording Meeting Audio Via Multiple Individual Smartphones,” which is a continuation of U.S. application Ser. No. 15/214,559, filed on Jul. 20, 2016 (now U.S. Pat. No. 10,171,908), entitled, “Recording Meeting Audio Via Multiple Individual Smartphones,” which claims priority to U.S. Prov. App. No. 62/197,249, filed on Jul. 27, 2015, entitled “Recording Meeting Audio Via Multiple Individual Smartphones,” all of which are incorporated herein by reference.

TECHNICAL FIELD

This application is directed to the field of recording and processing audio, and more particularly to the field of recording and organizing audio streams recorded on multiple smartphones of meeting participants.

BACKGROUND OF THE INVENTION

Inefficiency of business meetings poses a major challenge for enterprise and personal productivity. According to expert estimates, millions of corporate and organizational meetings occur in the United States daily. On average, employees spend 20-40% of their time in meetings, while upper managers dedicate up to 80-90% of their time to meetings.

Multiple surveys agree in their findings that only a relatively small percentage of meetings (45-50%, by some estimates) are considered efficient by the majority of their participants. Accordingly, there is a shared belief among time management professionals that nearly 50% of employee time in meetings is wasted. This is confirmed by many polls: for example, 47% of 3,200 employees participating in a 2012 work efficiency survey identified corporate meetings as the single leading time-wasting factor. It is estimated that the inefficiency of corporate meetings causes multi-billion dollar business losses every year.

Experts in various areas of business efficiency are dissecting the problems facing meeting productivity in many different ways. Overall, a significant number of researchers agree that two factors related to enterprise information management rank at the top of the list of issues causing extensive waste of time at and between meetings: the first factor is insufficient preparedness by meeting organizers, participants and leaders, and the second factor is poor compilation and distribution of meeting materials. According to one survey on meeting efficiency, for two thirds of repetitive business meetings, no relevant materials are distributed to meeting participants, who are left guessing about the meeting specifics and remain confused about decisions made at the meetings. The same survey discovered that around 63% of repetitive meetings do not always have meeting minutes produced, while almost 40% of the existing meeting minutes take over a week to be delivered to the target audience. Consequently, meeting results and follow-up actions also remain unknown to many participants and other involved personnel. These shortcomings in the organization and follow-up of business meetings call for a significant improvement in creating and distributing adequate meeting materials.

Smartphones have long since become pervasive mobile devices and are playing an increasing role in scheduling and tracking business meetings. It can be safely assumed that, for the vast majority of modern corporate and organizational meetings, nearly 100% of participants own one or another model of smartphone and bring their mobile device (or multiple devices) to every meeting. Smartphones and other mobile devices, such as phablets or larger tablet devices, are used by meeting participants on a regular basis to receive scheduling information, access meeting agendas, capture various visual aspects of meetings via smartphone cameras (such as whiteboard content), take typed and handwritten notes, etc. Some participants also record meeting audio on their smartphones.

Notwithstanding significant progress in capturing meeting materials, an important component of in-person and remote meetings, an adequate audio recording, remains underdeveloped. Only a small number of conference facilities are equipped with high-end audio conferencing and recording equipment, including distributed microphones. But even with such equipment, meeting speech segmentation and speaker diarization (determining who spoke when) represent difficult tasks and complicate the processing of audio recordings of meetings by their recipients. Accordingly, post-processing of recorded audio materials, adding voice and other annotations, handling simultaneous talk by several participants, voice-to-text transcription of meeting results, and compiling a consistent storyline for distribution to participants and others may become a difficult and time consuming task and may further contribute to delays in meeting follow-up.

Accordingly, it is desirable to provide high quality audio recording of meetings without expensive equipment, allowing speaker identification, easy annotation and voice-to-text conversion.

SUMMARY OF THE INVENTION

According to the system described herein, recording audio information from a meeting includes determining which of a plurality of specific personal audio input devices correspond to which specific meeting participants, measuring volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, identifying that a first particular one of the participants is speaking based on stored voice profiles and/or relative volume levels at each of the personal audio input devices, recording on a first channel audio input at a first one of the personal audio input devices corresponding to the first particular speaker, identifying that a second particular one of the participants is speaking based on stored voice profiles and/or relative volume levels at each of the personal audio input devices, recording on a second channel audio input at a second one of the personal audio input devices corresponding to the second particular speaker, the first and second channels being separate from each other, and merging the first and second channels to provide a storyboard that includes audio input from the channels and identification of speakers based on which specific ones of the channels contain the audio input. Determining which of the plurality of specific personal audio input devices correspond to which of the meeting participants may be based on which of the meeting participants owns which of the specific personal audio input devices. At least some of the specific personal audio input devices may be smartphones. Recording audio information from a meeting may also include linking the plurality of specific personal audio input devices prior to the meeting and equalizing sound detection levels in the personal audio input devices. Recording audio information from a meeting may also include simultaneously recording audio input at the first one of the personal audio input devices and on the first channel and audio input at the second one of the personal audio input devices and on the second channel in response to the first and second meeting participants speaking at the same time. Recording audio information from a meeting may also include filtering the audio input at the first channel and the second channel to separate speech by the first participant from speech by the second participant. Filtering the audio input may be based on a distance related volume weakening coefficient, signal latency between the personal audio input devices, and/or ambient noise. Recording audio information from a meeting may also include providing additional voice annotations on the storyboard. The additional annotations may be provided following the meeting. The additional annotations may be provided by one of the participants. The additional annotations may be related to specific speech fragments and/or commentary for an entire meeting. Recording audio information from a meeting may also include providing pre-recorded introductions on the storyboard for at least some of the meeting participants. Visual identification may be provided on a particular one of the personal audio input devices identified as corresponding to a current speaker. A meeting participant may confirm on the particular one of the personal audio input devices whether a corresponding one of the meeting participants is the current speaker. At least a portion of the storyboard may be transcribed using voice-to-text transcription. At least some of the meeting participants may be in remote offices.

According further to the system described herein, a non-transitory computer-readable medium contains software that records audio information from a meeting. The software includes executable code that determines which of a plurality of specific personal audio input devices correspond to which specific meeting participants, executable code that measures volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, executable code that identifies that a first particular one of the participants is speaking based on at least one of: stored voice profiles and relative volume levels at each of the personal audio input devices, executable code that records on a first channel audio input at a first one of the personal audio input devices corresponding to the first particular speaker, executable code that identifies that a second particular one of the participants is speaking based on stored voice profiles and/or relative volume levels at each of the personal audio input devices, executable code that records on a second channel audio input at a second one of the personal audio input devices corresponding to the second particular speaker, the first and second channels being separate from each other, and executable code that merges the first and second channels to provide a storyboard that includes audio input from the channels and identification of speakers based on which specific ones of the channels contain the audio input. Determining which of the plurality of specific personal audio input devices correspond to which of the meeting participants may be based on which of the meeting participants owns which of the specific personal audio input devices. At least some of the specific personal audio input devices may be smartphones. The software may also include executable code that links the plurality of specific personal audio input devices prior to the meeting and equalizes sound detection levels in the personal audio input devices. The software may also include executable code that simultaneously records audio input at the first one of the personal audio input devices and on the first channel and audio input at the second one of the personal audio input devices and on the second channel in response to the first and second meeting participants speaking at the same time. The software may also include executable code that filters the audio input at the first channel and the second channel to separate speech by the first participant from speech by the second participant. Filtering the audio input may be based on a distance related volume weakening coefficient, signal latency between the personal audio input devices, and/or ambient noise. The software may also include executable code that facilitates providing additional voice annotations on the storyboard. The additional annotations may be provided following the meeting. The additional annotations may be provided by one of the participants. The additional annotations may be related to specific speech fragments and/or commentary for an entire meeting. The software may also include executable code that facilitates providing pre-recorded introductions on the storyboard for at least some of the meeting participants. Visual identification may be provided on a particular one of the personal audio input devices identified as corresponding to a current speaker. A meeting participant may confirm on the particular one of the personal audio input devices whether a corresponding one of the meeting participants is the current speaker. At least a portion of the storyboard may be transcribed using voice-to-text transcription. At least some of the meeting participants may be in remote offices.

The proposed system records meeting audio on individual smartphones of meeting participants while providing automatic diarization and speaker identification for every speaker change; a master recording of each fragment is provided by a principal smartphone belonging to the speaker (or to each of the speakers in case of double-talk). Subsequently, a final arrangement of fragments into a storyline is compiled from recordings by principal smartphones. The system partially or completely clears double-talk episodes from fragments of recording using cross-recording by principal smartphones. The system also provides enhanced voice annotation and voice-to-text conversion capabilities, taking advantage of speaker identification, and may compile voice annotations into the storyline.

System functioning includes the following:

1. At the start of a meeting, the system establishes connections between smartphones of participants; connection types may depend on the system implementation and configuration and may include peer-to-peer connections, as well as various types of client-server connections that may use local or cloud server(s).

2. The system may change audio settings on phones of meeting participants to level (equalize) recording characteristics of different phones, as sketched below. Initially, the system may do this at the start of the meeting and may continue fine-tuning phones as the meeting progresses and as acoustic parameters of the meeting room are revealed and possibly change.
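
A minimal sketch of such level equalization follows, assuming each phone can supply a short calibration clip as an array of samples; the `calibration_clips` mapping, `target_rms` value and phone names are illustrative assumptions, not structures defined by the system.

```python
import numpy as np

def rms(samples: np.ndarray) -> float:
    """Root-mean-square level of a mono audio clip."""
    return float(np.sqrt(np.mean(np.square(samples, dtype=np.float64))))

def equalize_gains(calibration_clips: dict, target_rms: float = 0.1) -> dict:
    """Compute a per-phone gain so that the same reference sound is
    recorded at roughly the same level on every device (step 2 above)."""
    gains = {}
    for phone_id, clip in calibration_clips.items():
        level = rms(clip)
        gains[phone_id] = target_rms / level if level > 0 else 1.0
    return gains

# Two phones hear the same reference sound at different levels.
clips = {
    "phone_A": 0.20 * np.random.randn(16000),  # hotter microphone
    "phone_B": 0.05 * np.random.randn(16000),  # quieter microphone
}
print(equalize_gains(clips))  # phone_B gets roughly 4x the gain of phone_A
```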

3. All smartphones of meeting participants may be permanently placed into recording modes. A natural pre-condition of system functioning is that each participant is using the smartphone belonging to that participant as a personal device. Accordingly, a particular smartphone of a meeting participant resides significantly closer to an owner of the particular smartphone than smartphones of other participants. Therefore, the sound volume received by the smartphone of the speaking person is significantly higher than the sound volume received by smartphones of other participants (with possible normalization for the maximum recording volumes accepted by different devices and after calibration of each phone to the normal talking volume of its owner).

4. The difference in reception volumes by various phones of participants, averaged over short periods of time, may be propagated to the system and may drive diarization of the current fragment of the audio recording, i.e. speaker identification. In an embodiment, a pairwise connection graph between smartphones of participants may be used as a speaker identification model. For example, if a node of the connection graph may be detected such that min(ΔV)>0 for each edge incident with the node, where ΔV is a difference of an averaged reception volume between the smartphone in the detected node and the other node incident to the edge, then the participant and the smartphone corresponding to the detected node may serve as candidates, respectively, for the current speaker and a principal recording phone, as explained elsewhere herein and sketched below.
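
As a rough illustration only, the min(ΔV)>0 test over a complete pairwise connection graph might be coded as follows; the dictionary of averaged volumes and the phone names are assumed inputs, not part of the system as described.

```python
def identify_speaker(avg_volume: dict) -> str | None:
    """Return the phone whose averaged reception volume exceeds that of
    every other phone, i.e. the node with min(dV) > 0 over all incident
    edges of the (complete) pairwise connection graph; None if no node
    dominates (e.g. during silence or double-talk)."""
    for candidate, v in avg_volume.items():
        deltas = [v - other
                  for phone, other in avg_volume.items() if phone != candidate]
        if deltas and min(deltas) > 0:
            return candidate
    return None

# Averaged volumes over a short window: the speaker's own phone is loudest.
volumes = {"phone_A": 0.42, "phone_B": 0.17, "phone_C": 0.11}
print(identify_speaker(volumes))  # -> "phone_A"
```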

5. Once the current speaker is identified, a particular smartphone of the speaker is marked by the system as a principal recording device and the system tracks a corresponding fragment of audio recording by that particular smartphone until a sufficiently long pause, when the speaker either stopped talking to change the subject or for another reason, or until the current speaker is replaced by another speaker. In either case, the fragment is picked up by the system and added to the current channel of the speaker. Each channel of each speaker may therefore include subsequent fragments by a single speaker, uninterrupted by others and separated by pauses (such fragments may, of course, be merged during post-processing of the meeting recording), or fragments separated in time by audio fragments from other speakers recorded in their channels. In addition to volume characteristics, latency of signal reception and an explicit voice ID or audio profile of each speaker, generated by voice identification or voice recognition systems and stored in the system or on individual smartphones, may be used to further verify speaker identity and improve diarization. Note that other smartphones may remain in permanent recording modes at all times and therefore record the audio stream of each speaker, albeit with lower volume and clarity. However, recorded fragments of other speakers by other smartphones may be discarded by the system and may not be added to any of the speaker channels. The segmentation of a stream into fragments is sketched below.
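
The sketch below illustrates cutting a stream into fragments on pauses and speaker changes; the per-window speaker labels are assumed to come from an identification step like the one above, and the `max_pause` threshold is an illustrative parameter, not a value from the system description.

```python
def segment_fragments(window_speakers, max_pause=3):
    """Cut a stream of per-window speaker labels (None = silence) into
    fragments, closing a fragment on a speaker change or on a pause of
    more than `max_pause` consecutive silent windows. Returns a list of
    (speaker, start_window, end_window) tuples; illustrative only."""
    fragments, current, start, silent = [], None, 0, 0
    for i, spk in enumerate(window_speakers):
        if spk == current and spk is not None:
            silent = 0                        # same speaker resumes
        elif spk is None and current is not None:
            silent += 1
            if silent > max_pause:            # long pause ends the fragment
                fragments.append((current, start, i - silent))
                current, silent = None, 0
        elif spk != current:                  # speaker change
            if current is not None:
                fragments.append((current, start, i - 1 - silent))
            current, start, silent = spk, i, 0
    if current is not None:
        fragments.append((current, start, len(window_speakers) - 1 - silent))
    return fragments

labels = ["J", "J", None, "J", None, None, None, None, "H", "H"]
print(segment_fragments(labels))  # [('J', 0, 3), ('H', 8, 9)]
```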

6. In the event of double-talk, when two or more speakers talk simultaneously for a period of time, the system may initially identify each speaker, as explained above, and record the double-talk on all principal smartphones owned by current speakers. After a double-talk episode has ended, the system may attempt to clear each recorded fragment from double-talk by non-owners prior to placing it into the corresponding speaker channel. Such clearing may be facilitated by simultaneous processing of recorded fragments from all principal phones engaged in the double-talk.

For example, in the case of two simultaneous speakers, John (J) and Helen (H), the two signals recorded by their principal phones may be schematically presented as:

J₂(t) + αH₁(t−β) + A₁ (John's channel)

H₁(t) + αJ₂(t−β) + A₂ (Helen's channel)

where J₂(t) and H₁(t) are a second speaking fragment in John's channel and a first speaking fragment in Helen's channel; α is a distance related volume weakening coefficient; β is the signal latency between the two phones; and A₁, A₂ are ambient noises recorded by the first and the second speaker's smartphones.

The availability of two symmetric cross-recordings may facilitate assessing the coefficients (after an initial cancellation of ambient noises A₁, A₂) and filtering out the weaker components using, for example, an echo cancellation technique, as sketched below. Even if the double-talk suppression process has not fully succeeded, each channel unambiguously represents a corresponding speaker, and any mix of speaker voices may be instantly identified in a full record by referring to the simultaneous recording by other principal phone(s), i.e. by switching channels of simultaneous speakers.
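
One way to sketch this cleanup is a normalized LMS adaptive filter, as used in echo cancellation: Helen's principal recording serves as the reference for removing the leaked αH₁(t−β) term from John's channel. This is an illustrative stand-in, not the specific filter of the system described herein; the gain, delay and signal lengths below are toy values, and ambient noise is omitted.

```python
import numpy as np

def nlms_cancel(primary, reference, taps=64, mu=0.05, eps=1e-8):
    """Suppress the component of `reference` that leaks into `primary`
    (attenuated by alpha, delayed by beta) using a normalized-LMS
    adaptive filter; returns the cleaned primary signal."""
    w = np.zeros(taps)
    out = np.copy(primary)
    for n in range(taps, len(primary)):
        x = reference[n - taps:n][::-1]      # recent reference samples
        y = w @ x                            # estimated leakage
        e = primary[n] - y                   # leakage-suppressed sample
        w += (mu / (eps + x @ x)) * e * x    # NLMS weight update
        out[n] = e
    return out

# Toy double-talk: Helen leaks into John's phone at gain 0.3 (alpha)
# with a 5-sample delay (beta).
rng = np.random.default_rng(0)
john = rng.standard_normal(20000)
helen = rng.standard_normal(20000)
johns_mic = john + 0.3 * np.roll(helen, 5)
cleaned = nlms_cancel(johns_mic, helen)
before = np.mean((johns_mic - john) ** 2)              # ~0.09 leak power
after = np.mean((cleaned[10000:] - john[10000:]) ** 2)
print(before, after)  # suppression is partial, as the text above notes
```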

7. In the first minutes of a new meeting, the system may function in a learning mode, adapting to the configuration of smartphones of meeting participants and to a specific acoustic environment of a meeting place. Accordingly, the system may provide and request feedback from meeting participants. For instance, a visual signal (blinking, color change, other display effects) may appear on a screen of a primary recording device. If the device has been chosen incorrectly, the participant may respond to the system, for example, by tapping on a touchscreen to indicate an error without interrupting the meeting, and the system may continue searching for the right primary device, narrowing the possible choices. This may lead to swapping recorded fragments between channels due to a change of identified speaker and may affect recording quality through a learning period.

8. Recipients of an audio recording of the meeting may replay the recording in a variety of ways and may compile one or multiple storyboards from all or certain subsets of speaker channels and fragments, emphasizing or de-emphasizing certain speakers and/or fragments and potentially grouping speakers/fragments by topics.

9. Meeting participants and recipients of the recording may add voice annotations to particular speaker fragments or to the whole recording. The system may initially record such comments in separate channels opened for each commenter (after a commenter identifies himself) and may establish references between annotations and initial fragments. Subsequently, the system may automatically or semi-automatically, directed by commenters, compile storyline(s) that include speaking fragments of original participants, combined with annotations.

10. Additionally, where the sound quality recorded by principal phones is sufficient, speech recognition technologies and software for voice-to-text conversion may be used for creating transcriptions. Voice-to-text applications may additionally benefit from explicit voice profiles of identified speakers recorded on devices of the speakers or made available otherwise.

The system may offer a range of tools, applications and services for manipulating, cross-referencing, enhancing, annotating, compiling storylines from, converting to text, and distributing meeting recordings in their original and/or modified forms, and may thereby contribute to adequate and timely distribution of meeting content. The system may be combined with other media, such as photographs of whiteboards with meeting brainstorms, accompanying documents, email correspondence, etc.

In addition to a single meeting room configuration, the system may be used for remote meetings in several locations. In this case, additional audio sources, such as speakers used to listen to remote participants, may be present in some or all remotely connected meeting rooms, and some of the remote participants may be identified only within a physical meeting room containing the remote participants, while remaining unidentified in other meeting rooms until the end of the meeting; the remote participants may be subsequently identified on the meeting storyline, which may also include indications of meeting rooms with each speaking fragment.

The system may also track meeting participants who do not have smartphones in close proximity, temporarily or for the duration of the meeting. For example, a new participant without a smartphone may join a meeting and may be detected by a closest phone but not necessarily identified personally as one of the participants with a Voice ID and/or known profile. The identity of the new participant may remain unknown or may be explicitly added at storyline compilation time. Another similar situation occurs when a user leaves a smartphone at a desk and moves around a meeting room, for example, to draw on a whiteboard or control a projector. In this case, a participant speaking away from a personal smartphone may be tracked by a closest smartphone of another participant but should not be identified as the owner of the closest smartphone; to provide correct identification, the system may poll voice IDs and profiles of all participants in the meeting room.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the system described herein will now be explained in more detail in accordance with the figures of the drawings, which are briefly described as follows.

FIG. 1 is a schematic illustration of speaker identification, according to an embodiment of the system described herein.

FIG. 2 is a schematic illustration of speaker channels and of handling double-talk episodes, according to an embodiment of the system described herein.

FIG. 3 schematically illustrates storyline compilation, post-meeting annotation and voice-to-text features, according to an embodiment of the system described herein.

FIG. 4 is a system flow diagram illustrating system functioning, according to an embodiment of the system described herein.

DETAILED DESCRIPTION OF VARIOUS EMBODIMENTS

The system described herein provides a mechanism for recording meeting audio on multiple individual smartphones of meeting participants, automatic speaker identification, handling double-talk episodes, compiling a meeting storyline from fragments recorded in speaker channels, including post-meeting voice annotations, and optional voice-to-text conversion of certain portions of a recording.

FIG. 1 is a schematic illustration 100 of speaker identification. Meeting participants 110, 120, 130 come to a meeting with personal smartphones 140, 150, 160 (possibly running different mobile operating systems), all capable of audio recording and having software, explained elsewhere herein, installed. Note that other personal audio input devices may be used in place of one or more of the smartphones 140, 150, 160, such as a tablet, a dedicated recording device, etc. A connection graph 170 is used to identify a current speaker. In FIG. 1, the participant 110 starts speaking and each of the smartphones 140, 150, 160 receives voice signals 170a, 170b, 170c of different volume, as schematically depicted by line thickness decreasing for the smartphones 150, 160 located further from the speaker 110. The system averages signal volumes received by each of the smartphones 140, 150, 160 over short periods of time and builds deltas of average volumes for each edge of the connection graph 170. If a condition min(ΔV)>0 is satisfied, where the minimum is taken over all edges starting at the node 140, then the participant 110, who is an owner of the smartphone 140, is marked as a candidate for an active speaker. Additional conditions may be checked for verification; for example, unique voice characteristics may be extracted from the signal and compared with a stored value of Voice ID on the smartphone 140 of the participant 110, as illustrated by criteria 180 and sketched below.
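
A crude sketch of such a secondary check follows. Real systems use trained speaker embeddings; the long-term average log-spectrum signature, the cosine-similarity comparison and the threshold below are illustrative assumptions only, not the Voice ID format of the criteria 180.

```python
import numpy as np

def voice_signature(samples: np.ndarray, n_fft: int = 512) -> np.ndarray:
    """Crude voice signature: unit-norm long-term average log-spectrum."""
    frames = [samples[i:i + n_fft]
              for i in range(0, len(samples) - n_fft, n_fft)]
    spec = np.mean([np.abs(np.fft.rfft(f * np.hanning(n_fft)))
                    for f in frames], axis=0)
    sig = np.log(spec + 1e-9)
    return sig / np.linalg.norm(sig)

def matches_voice_id(samples, stored_signature, threshold=0.9) -> bool:
    """Compare the live signal against the Voice ID stored on the
    candidate speaker's phone via cosine similarity."""
    return float(voice_signature(samples) @ stored_signature) >= threshold

# Enroll a participant, then verify a later clip of the same recording.
rng = np.random.default_rng(1)
enrolled = rng.standard_normal(32000)
print(matches_voice_id(enrolled, voice_signature(enrolled)))  # True
```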

If all conditions and checks for a current speaker are satisfied, the participant 110 is marked as an active speaker and the smartphone 140 of the participant 110 is marked as a principal recording device and becomes a designated one of the smartphones 140, 150, 160 recording a voice stream of the participant 110, as schematically shown on the screen of the smartphone 140. A channel of the participant 110 is activated (or created if the participant 110 speaks for the first time in the meeting) and a fragment of an audio recording of the participant 110 is added to the channel after a pause or speaker change, as explained elsewhere herein.

FIG. 2 is a schematic illustration 200 of speaker channels and of handling double-talk episodes. Each of the meeting participants 110, 120, 130 has been an active speaker at some time during the meeting; accordingly, channels 210, 220, 230 corresponding to the participants 110, 120, 130 have been created by the system and keep audio fragments 240 of active speakers. When a fragment of double-talk is identified, the fragment may be recorded on more than one principal device, as illustrated by double-talk fragments 250a, 250b. Even though the audio signals recorded by the two principal devices represent the same conversation, the audio signals may not be identical, as explained elsewhere herein and illustrated by audio signal profile functions 260a, 260b. The system may attempt to resolve a double-talk fragment and retrieve individual fragments assignable to each active speaker channel by applying various filtering techniques 270, such as LMS filtering. If successful, separate fragments 280a, 280b may be added to their respective channels. Otherwise, the double-talk recorded on each principal recording device may be added to the corresponding channel, and all double-talk fragments may be cross-referenced and switchable between channels.

FIG. 3 is a schematic illustration 300 of storyline compilation, post-meeting annotation and voice-to-text features. The three channels 210, 220, 230 of the original meeting participants 110, 120, 130 contain the fragments 240, 280a, 280b of audio recording by each active speaker during the meeting. A commenter 310 listens to the recording of the meeting and decides to add voice comments to several fragments. A separate channel 320 is created for the commenter 310 and maintains a voice annotation 330 for a fragment 240 by the participant 110 and another voice annotation 340 for a fragment 280b by the speaker 130, retrieved from a double-talk, as explained elsewhere herein. Voice annotations may also refer to a particular topic or a meeting as a whole.

A storyline 350 of a meeting may be compiled from original audio fragments of meeting participants recorded during the meeting, combined with voice annotations and other components, such as pre-recorded introductions of each speaker, and organized chronologically, by topics, or otherwise. For example, the storyline 350 may be organized in a chronological order of speaker fragments, with the addition of voice annotations immediately after the annotated fragments, as sketched below. Such storylines may be distributed as key meeting materials shortly after the end of the meeting.
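
The chronological compilation might be sketched as follows; the `Fragment` structure, file names and timing fields are illustrative assumptions, standing in for however a concrete implementation represents channels and clips.

```python
from dataclasses import dataclass, field

@dataclass
class Fragment:
    start: float          # seconds from meeting start
    speaker: str          # identified speaker (channel owner)
    audio: str            # reference to the recorded clip (illustrative)
    annotations: list = field(default_factory=list)  # commenter clips

def compile_storyline(channels: dict) -> list:
    """Merge per-speaker channels into one chronological storyline,
    placing each voice annotation immediately after the fragment it
    annotates, as described for the storyline 350."""
    fragments = sorted((f for ch in channels.values() for f in ch),
                       key=lambda f: f.start)
    storyline = []
    for f in fragments:
        storyline.append((f.speaker, f.audio))
        storyline.extend(("annotation", a) for a in f.annotations)
    return storyline

channels = {
    "John":  [Fragment(0.0, "John", "j1.wav"),
              Fragment(95.0, "John", "j2.wav", ["comment_1.wav"])],
    "Helen": [Fragment(40.0, "Helen", "h1.wav")],
}
for entry in compile_storyline(channels):
    print(entry)
# ('John', 'j1.wav'), ('Helen', 'h1.wav'),
# ('John', 'j2.wav'), ('annotation', 'comment_1.wav')
```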

Some of the recorded audio fragments may be converted to text using voice-to-text technologies. In FIG. 3, a fragment 360 by the participant 110 is automatically transcribed. To facilitate voice recognition, a voice profile 370 may be extracted from a device of the participant 110 (or may be stored in the system, as for the commenter 310) and used by a voice-to-text system 380 to create transcribed text 390.

Referring to FIG. 4, a system flow diagram 400 illustrates processing in connection with recording a meeting on multiple individual phones. Note that the processing for the system described herein may be provided by one or more of the smartphones 140, 150, 160, by a smartphone (or similar device) of a non-participant, by a separate computing device (e.g., a desktop computer), in connection with a cloud service (or similar) coupled to one or more of the smartphones 140, 150, 160, etc.

Processing begins at a step 410, where the system establishes connections between smartphones of participants and/or with a local or cloud service run by the system. The system may also ensure that software for the system is running on each smartphone of each participant and that a recording mode on each smartphone is enabled. After the step 410, processing proceeds to a step 415, where a meeting participant speaks. After the step 415, processing proceeds to a step 420, where the system measures the average volume of an audio signal over short periods of time and the delay of the audio signal on each smartphone, as explained elsewhere herein (see in particular FIG. 1 and the accompanying text). After the step 420, processing proceeds to a step 425, where the system calculates deltas of average volumes received by different smartphones of the participants over a connectivity graph, as explained in more detail in conjunction with FIG. 1.

After the step 425, processing proceeds to a step 430, where a candidate for the current speaker is detected according to specific criteria, as explained elsewhere herein. After the step 430, processing proceeds to a step 435, where the system runs an additional speaker identification check, as explained in conjunction with FIG. 1 and the speaker identification criteria 180 explained elsewhere herein. After the step 435, processing proceeds to a step 440, where the system designates and marks a principal recording smartphone (i.e., the only smartphone that records the audio fragment from the current speaker until conditions change, as explained elsewhere herein). After the step 440, processing proceeds to a test step 445, where it is determined whether double-talk is detected by the system. If so, processing proceeds to a step 450, where the system marks the starting time stamp of a double-talk fragment and designates principal smartphones for recording the fragment. After the step 450, processing proceeds to a test step 455, where it is determined whether any of the speakers has stopped talking (note that the test step 455 can be independently reached from the test step 445 if double-talk has not been identified). If not, processing proceeds to a step 460, where the system and the principal recording smartphone(s) continue capturing the current fragment.

After the step 460, processing proceeds back to the test step 455. If it was determined at the test step 455 that any of the current speakers stopped talking, processing proceeds to a step 465, where a recorded speaker fragment from principal smartphones is added to the corresponding speaker channels, as explained elsewhere herein (see FIGS. 2, 3 and the accompanying text). After the step 465, processing proceeds to a test step 470, where it is determined whether any speaker is talking. If so, processing proceeds back to the step 460 for continued recording of a fragment; otherwise, processing proceeds to a test step 475, where it is determined whether the meeting is over. If not, processing proceeds back to the step 425 for continued speaker identification and recording of the meeting by audio fragments; otherwise, processing proceeds to a step 480, where the system attempts filtering out double-talk background from principal fragments or splitting double-talk into speaker channels, as explained elsewhere herein (see FIG. 2 and the accompanying text).

After the step 480, processing proceeds to a step 485, where certain fragments may optionally be transcribed to text, as explained elsewhere herein, in particular in conjunction with FIG. 3. After the step 485, processing proceeds to a step 490, where voice annotations may optionally be added by meeting participants or other users of the system, as explained elsewhere herein. After the step 490, processing proceeds to a step 495, where audio storyboards of the meeting are compiled and distributed, as explained elsewhere herein. After the step 495, processing is complete.

Various embodiments discussed herein may be combined with each other in appropriate combinations in connection with the system described herein. Additionally, in some instances, the order of steps in the flowcharts, flow diagrams and/or described flow processing may be modified, where appropriate. Also, elements and areas of screens described in screen layouts may vary from the illustrations presented herein. Further, various aspects of the system described herein may be implemented using software, hardware, a combination of software and hardware and/or other computer-implemented modules or devices having the described features and performing the described functions. Smartphones functioning as audio recording devices may include software that is pre-loaded with the device, installed from an app store, installed from a desktop (after possibly being pre-loaded thereon), installed from media such as a CD, DVD, etc., and/or downloaded from a Web site. Such smartphones may use operating system(s) selected from the group consisting of iOS, Android OS, Windows Phone OS, Blackberry OS and mobile versions of Linux OS.

Software implementations of the system described herein may include executable code that is stored in a computer readable medium and executed by one or more processors. The computer readable medium may be non-transitory and include a computer hard drive, ROM, RAM, flash memory, portable computer storage media such as a CD-ROM, a DVD-ROM, a flash drive, an SD card and/or other drive with, for example, a universal serial bus (USB) interface, and/or any other appropriate tangible or non-transitory computer readable medium or computer memory on which executable code may be stored and executed by a processor. The software may be bundled (pre-loaded), installed from an app store or downloaded from a location of a network operator. The system described herein may be used in connection with any appropriate operating system.

Other embodiments of the invention will be apparent to those skilled in the art from a consideration of the specification or practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.

What is claimed is:
 1. A method of recording audio information from a meeting, the method comprising: executing a meeting management application, including establishing a plurality of connections with a plurality of audio input devices configured to record audio data; receiving a plurality of audio streams via the plurality of connections during a meeting, the plurality of audio streams including a first audio stream; associating the first audio stream with a first participant; identifying a first audio fragment from the first audio stream; transcribing the first audio fragment to first textual content; and compiling the plurality of audio streams into a storyboard of the meeting, the storyboard including at least the first textual content of the first audio fragment to be displayed in association with the first participant.
 2. The method of claim 1, further comprising: extracting a voice profile from a first audio input device associated with the first participant, wherein the first audio fragment is transcribed based on the voice profile.
 3. The method of claim 1, further comprising: distributing the storyboard of the meeting to a subset of the plurality of audio input devices.
 4. The method of claim 1, wherein the storyboard of the meeting is organized in a chronological order of audio fragments.
 5. The method of claim 1, wherein the storyboard of the meeting further includes a plurality of audio fragments.
 6. The method of claim 1, wherein the storyboard of the meeting further includes at least one of voice annotations and pre-recorded introductions of each active participant.
 7. The method of claim 1, further comprising: receiving one or more user inputs to edit the storyboard; and in response to the one or more user inputs to edit the storyboard, performing an action including one or more of: emphasizing the first audio fragment associated with the first participant; deemphasizing the first audio fragment associated with the first participant; grouping a plurality of audio fragments in the storyboard by topic; grouping audio fragments associated with the first participant in the storyboard; and adding new audio fragments to the storyboard.
 8. The method of claim 1, further comprising: receiving an additional user voice input annotating the first audio fragment, wherein the first audio fragment is automatically identified from the first audio stream in response to the additional user voice input.
 9. The method of claim 1, further comprising: receiving an additional user input annotating the first audio fragment, wherein the first audio fragment is automatically identified from the first audio stream in response to the additional user input.
 10. An electronic device, comprising: one or more processors; and memory storing one or more programs for execution by the one or more processors, the one or more programs including instructions for: executing a meeting management application, including establishing a plurality of connections with a plurality of audio input devices configured to record audio data; receiving a plurality of audio streams via the plurality of connections during a meeting, the plurality of audio streams including a first audio stream; associating the first audio stream with a first participant; identifying a first audio fragment from the first audio stream; transcribing the first audio fragment to first textual content; and compiling the plurality of audio streams into a storyboard of the meeting, the storyboard including at least the first textual content of the first audio fragment to be displayed in association with the first participant.
 11. The electronic device of claim 10, wherein the storyboard of the meeting is organized in a chronological order.
 12. The electronic device of claim 10, wherein the storyboard of the meeting further includes at least one of voice annotations and pre-recorded introductions of each active participant.
 13. The electronic device of claim 10, wherein the first audio stream is recorded by a first audio input device, and a visual signal is provided on the first audio input device to request feedback from the first participant associated with the first audio input device.
 14. The electronic device of claim 13, wherein the first participant responds to the visual signal provided on the first audio input device to confirm whether the first participant is currently speaking.
 15. A non-transitory computer-readable medium storing one or more programs configured for execution by a system, the one or more programs including instructions for: executing a meeting management application, including establishing a plurality of connections with a plurality of audio input devices configured to record audio data; receiving a plurality of audio streams via the plurality of connections during a meeting, the plurality of audio streams including a first audio stream; associating the first audio stream with a first participant; identifying a first audio fragment from the first audio stream; transcribing the first audio fragment to first textual content; and compiling the plurality of audio streams into a storyboard of the meeting, the storyboard including at least the first textual content of the first audio fragment to be displayed in association with the first participant.
 16. The non-transitory computer-readable medium of claim 15, the one or more programs further comprising instructions for: extracting a voice profile from a first audio input device associated with the first participant, wherein the first audio fragment is transcribed based on the voice profile.
 17. The non-transitory computer-readable medium of claim 15, the one or more programs further comprising instructions for: distributing the storyboard of the meeting to a subset of the plurality of audio input devices.
 18. The non-transitory computer-readable medium of claim 15, the plurality of audio streams including a second audio stream, the one or more programs further comprising instructions for: identifying from the second audio stream a second audio fragment, wherein the storyboard includes the second audio fragment; and providing the storyboard to one or more recipients, wherein the storyboard identifies the transcribed first audio fragment as corresponding to the first participant and the second audio fragment as corresponding to a second participant.
 19. The non-transitory computer-readable medium of claim 18, wherein providing the storyboard to the one or more recipients includes replaying the second audio fragment to the one or more recipients.
 20. The non-transitory computer-readable medium of claim 18, the one or more programs further comprising instructions for: maintaining the first audio fragment in a first audio channel associated with the first participant; and maintaining the second audio fragment in a second audio channel associated with the second participant, the first and second audio channels being separate from each other.